Aug 7, 2019

Overview

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.

You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

It’s hard to succinctly describe how ggplot2 works because it embodies a deep philosophy of visualisation.

However, in most cases you start with ggplot(), supply a dataset and aesthetic mapping (with aes()).
You then add on layers (like geom_point() or geom_histogram()), scales (like scale_colour_brewer()), faceting specifications (like facet_wrap()) and coordinate systems (like coord_flip()).
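
As a rough illustration of how these pieces combine, the sketch below builds one plot from a dataset, an aesthetic mapping, a layer, a scale, a faceting specification and a coordinate system. The choice of variables and of the "Set2" palette is an assumption made only for this example.

```r
library(ggplot2)

# Data + aesthetic mapping first, then one component of each kind.
p_overview <- ggplot(diamonds, aes(x = cut, fill = cut)) +
  geom_bar() +                            # layer: count of diamonds per cut
  scale_fill_brewer(palette = "Set2") +   # scale: ColorBrewer fill colours
  facet_wrap(~ color) +                   # facets: one panel per diamond colour
  coord_flip()                            # coordinate system: horizontal bars
print(p_overview)
```

Each `+` adds one component, so any of the four lines after `ggplot()` can be dropped or swapped without touching the others.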

1. The Setup

Source : http://r-statistics.co/ggplot2-Tutorial-With-R.html

diamonds
## # A tibble: 53,940 x 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
##  2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
##  3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
##  4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
##  5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
##  6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
##  7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
##  8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
##  9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
## 10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39
## # ... with 53,930 more rows
ggplot(diamonds)  # if only the dataset is known.

ggplot(diamonds, aes(x=carat))  # if only the X-axis is known. The Y-axis can be specified in the respective geoms.

ggplot(diamonds, aes(x=carat, y=price))  # if both X and Y axes are fixed for all layers.

ggplot(diamonds, aes(x=carat, color=cut))  # each category of the 'cut' variable gets a distinct color once a geom is added.

2. The Layers

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
  geom_point() + geom_smooth() # Adding scatterplot geom (layer1) and smoothing geom (layer2).
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

ggplot(diamonds) +
  geom_point(aes(x=carat, y=price, color=cut)) +
  geom_smooth(aes(x=carat, y=price, color=cut)) # Same as above but specifying the aesthetics inside the geoms.
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

ggplot(diamonds) +
  geom_point(aes(x=carat, y=price, color=cut)) +
  geom_smooth(aes(x=carat, y=price)) # Remove color from geom_smooth
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

ggplot(diamonds, aes(x=carat, y=price)) +
  geom_point(aes(color=cut)) + geom_smooth()  # same but simpler
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
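
The message printed above shows that geom_smooth() picked a default smoothing method for this dataset. As a small sketch of how per-layer settings can be controlled, the example below sets the method explicitly to a straight-line fit; the choice of "lm", the black line and the transparency value are assumptions for illustration, not part of the original tutorial.

```r
library(ggplot2)

# Global mapping for x and y; color is mapped only in the point layer,
# so the smoother draws a single line for the whole dataset.
p_layers <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = cut), alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black")
print(p_layers)
```

Because the method and formula are given explicitly, ggplot2 no longer needs to report which defaults it chose.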

3. The Labels

gg <- ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point() +
  labs(title="Scatterplot", x="Carat", y="Price")  # add axis labels and plot title.
print(gg)
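
labs() accepts more than the title and axis labels. The sketch below, with label texts invented for illustration, also sets a subtitle, a caption and the title of the color legend.

```r
library(ggplot2)

gg_labs <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  labs(title = "Scatterplot",
       subtitle = "Price against carat",    # hypothetical subtitle text
       caption = "Data: ggplot2::diamonds", # hypothetical caption text
       x = "Carat", y = "Price",
       color = "Cut of diamonds")           # legend title for the color aesthetic
print(gg_labs)
```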

4. The Theme

-Adjusting the size of labels: use the theme() function and set plot.title, axis.text.x and axis.text.y.
Each of them must be specified with element_text(). If you want to remove any of them, set it to element_blank() and it will vanish entirely.

-Adjusting the legend title: use scale_color_discrete(name=...). The "color" part refers to the color aesthetic, and "discrete" is used because the legend is based on a factor variable.

gg1 <- gg + theme(plot.title=element_text(size=30, face="bold"),
                  axis.text.x=element_text(size=15),
                  axis.text.y=element_text(size=15),
                  axis.title.x=element_text(size=25),
                  axis.title.y=element_text(size=25)) +
  scale_color_discrete(name="Cut of diamonds")  # set title and axis text sizes, change legend title.
print(gg1)  # print the plot

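The scale function for a legend title follows the pattern scale_<aesthetic>_<type>(name="legend title"):
-shape legend based on a factor variable \(\rightarrow\) scale_shape_discrete(name="legend title")
-color legend based on a continuous variable \(\rightarrow\) scale_color_continuous(name="legend title")
Note that ggplot2 cannot map a continuous variable to shape, so the continuous variant applies to aesthetics such as color or size.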
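
The theme code above only resizes elements. As a minimal sketch of the element_blank() case mentioned earlier, the example below removes the X-axis text and ticks while keeping a styled title; which elements to blank is an arbitrary choice for illustration.

```r
library(ggplot2)

gg_blank <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  labs(title = "Scatterplot") +
  theme(plot.title   = element_text(size = 20, face = "bold"),
        axis.text.x  = element_blank(),   # remove the X-axis tick labels
        axis.ticks.x = element_blank()) + # remove the X-axis tick marks
  scale_color_discrete(name = "Cut of diamonds")
print(gg_blank)
```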

5. The Facets

gg1 + facet_wrap( ~ cut, ncol=3)  # one panel per 'cut', wrapped into 3 columns

gg1 + facet_wrap(color ~ cut)  # one panel per color/cut combination, wrapped

gg1 + facet_wrap(color ~ cut, scales="free")  # same, but each panel gets its own axis ranges

gg1 + facet_grid(color ~ cut)   # in a grid: rows are color, columns are cut
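
The formula interface above can also be written with vars(), which may make the difference between the two functions easier to see. This is a sketch that assumes a ggplot2 version of 3.0.0 or later, where vars() is available for faceting.

```r
library(ggplot2)

base_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2)

# facet_wrap: one panel per color/cut combination, laid out as a wrapped ribbon.
p_wrap <- base_plot + facet_wrap(vars(color, cut))

# facet_grid: a true grid with color in the rows and cut in the columns.
p_grid <- base_plot + facet_grid(rows = vars(color), cols = vars(cut))

print(p_grid)
```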

6. Commonly Used Features

The ggfortify package makes it very easy to plot a time series directly from a time series object, without having to convert it to a dataframe. The example below plots the AirPassengers time series in one step.

6.1 Make a time series plot (using ggfortify)

library(ggfortify)  # provides autoplot() for 'ts' objects
autoplot(AirPassengers) +
  labs(title="AirPassengers")  # where AirPassengers is a 'ts' object

6.2 Plot multiple time series on same ggplot

# Approach 1:
data(economics, package="ggplot2")  # init data
economics <- data.frame(economics)  # convert to dataframe
ggplot(economics) + geom_line(aes(x=date, y=pce, color="pce")) +
  geom_line(aes(x=date, y=unemploy, color="unemploy")) +
  scale_color_discrete(name="Legend") + labs(title="Economics")  # plot multiple time series using 'geom_line's

# Approach 2:
library(reshape2)
df <- melt(economics[, c("date", "pce", "unemploy")], id="date")
ggplot(df) + geom_line(aes(x=date, y=value, color=variable)) +
  labs(title="Economics")  # plot multiple time series by melting

Multiple Y-axes on the same plot: plotting several time series on one scale can make the series with smaller values appear flat. An alternative is to use facet_wrap() and set scales="free".

df <- melt(economics[, c("date", "pce", "unemploy", "psavert")], id="date")
ggplot(df) + geom_line(aes(x=date, y=value, color=variable))  +
  facet_wrap( ~ variable, scales="free")
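
The melting step does not have to use reshape2. As a sketch of an alternative (not the approach used in this tutorial), tidyr::pivot_longer() produces the same long layout; the column names "variable" and "value" are chosen here only to match the melt() output.

```r
library(ggplot2)
library(tidyr)

data(economics, package = "ggplot2")

# Reshape the three series into long format: one row per date and series.
df_long <- pivot_longer(economics[, c("date", "pce", "unemploy", "psavert")],
                        cols = -date,
                        names_to = "variable",
                        values_to = "value")

p_long <- ggplot(df_long) +
  geom_line(aes(x = date, y = value, color = variable)) +
  facet_wrap(~ variable, scales = "free_y")  # free only the Y axis; dates stay aligned
print(p_long)
```

Unlike melt(), pivot_longer() returns "variable" as a character column rather than a factor, so the panels are ordered alphabetically.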

6.3 ~ 6.5 Bar charts (Exercises!)

6.6 Adjust X and Y axis limits

There are 3 ways to change the X and Y axis limits.

  1. Using coord_cartesian(xlim=c(x1,x2))
  2. Using xlim(c(x1,x2))
  3. Using scale_x_continuous(limits=c(x1,x2))
Warning: methods 2 and 3 delete the data points that lie outside the limits, which also changes anything computed from the data, such as smoothing lines. Method 1 does not delete any data point; it only zooms in to a specific region of the chart.
ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
  geom_point() + geom_smooth() + coord_cartesian(ylim=c(0, 10000)) +
  labs(title="Coord_cartesian zoomed in!")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point() +
  geom_smooth() + ylim(c(0, 10000)) +
  labs(title="Datapoints deleted: Note the change in smoothing lines!")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 5222 rows containing non-finite values (stat_smooth).
## Warning: Removed 5222 rows containing missing values (geom_point).
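
The difference between the two approaches can also be checked numerically. The sketch below uses layer_data() to inspect the data that each version of the plot would draw and counts the points that still have a Y value; the object names are hypothetical, and the dropped count should correspond to the rows reported in the warnings above.

```r
library(ggplot2)

p_base <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()

# Data behind the point layer under each way of limiting the Y axis.
zoomed  <- layer_data(p_base + coord_cartesian(ylim = c(0, 10000)))
clipped <- layer_data(p_base + ylim(0, 10000))

n_zoomed  <- sum(!is.na(zoomed$y))   # coord_cartesian keeps every point
n_clipped <- sum(!is.na(clipped$y))  # ylim turns out-of-range prices into NA

c(zoomed = n_zoomed, clipped = n_clipped, dropped = n_zoomed - n_clipped)
```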

6.7 Equal coordinates
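
A minimal sketch for this heading, assuming it refers to coord_equal(): the function forces one unit on the X axis to have the same physical length as one unit on the Y axis. The diamonds columns x and y (both in mm) are used here only because they share a unit.

```r
library(ggplot2)

p_equal <- ggplot(diamonds, aes(x = x, y = y)) +
  geom_point(alpha = 0.2) +
  coord_equal() +   # 1 unit on X is drawn as long as 1 unit on Y
  labs(title = "coord_equal()", x = "Length (mm)", y = "Width (mm)")
print(p_equal)
```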

6.8 Change themes

Apart from the basic ggplot2 theme, you can change the look and feel of your plots using one of these built-in themes.

-theme_gray()
-theme_bw()
-theme_linedraw()
-theme_light()
-theme_minimal()
-theme_classic()
-theme_void()

The ggthemes package provides additional ggplot themes that imitate the style of well-known magazines and software.

ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point() +
  geom_smooth() + theme_bw() + labs(title="bw Theme")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
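
For the ggthemes package mentioned above, a small sketch might look like the following. It assumes ggthemes is installed and uses theme_economist() as one example of the magazine-style themes; if the package is missing, the plot falls back to theme_bw().

```r
library(ggplot2)

p_theme <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point()

if (requireNamespace("ggthemes", quietly = TRUE)) {
  # Theme styled after The Economist (from the ggthemes package).
  p_theme <- p_theme + ggthemes::theme_economist() + labs(title = "Economist Theme")
} else {
  # Fallback when ggthemes is not installed.
  p_theme <- p_theme + theme_bw() + labs(title = "bw Theme")
}
print(p_theme)
```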

6.9 Legend - Deleting and Changing Position

By setting theme(legend.position="none"), you can remove the legend.
By setting it to "top" (or "bottom", "left", "right"), you can move the legend to that side of the plot.
By setting legend.position to a co-ordinate inside the plot, you can place the legend inside the plot itself.
The legend.justification denotes the point of the legend that will be placed on the co-ordinates given by legend.position.

p1 <- ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
  geom_point() + geom_smooth() + theme(legend.position="none") +
  labs(title="legend.position='none'")  # remove legend

p2 <- ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
  geom_point() + geom_smooth() + theme(legend.position="top") +
  labs(title="legend.position='top'")  # at top

p3 <- ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point() +
  geom_smooth() + labs(title="legend.position='coords inside plot'") +
  theme(legend.justification=c(1,0), legend.position=c(1,0))  # inside the plot

gridExtra::grid.arrange(p1, p2, p3, ncol=3)  # arrange
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
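
Newer ggplot2 releases (3.5.0 and later) deprecate passing numeric co-ordinates to legend.position and use legend.position="inside" together with legend.position.inside instead. The sketch below, which is an addition to the original tutorial, picks the form that matches the installed version.

```r
library(ggplot2)

base_legend <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point()

if (packageVersion("ggplot2") >= "3.5.0") {
  # Newer interface: say "inside", then give the co-ordinates separately.
  p_inside <- base_legend +
    theme(legend.position = "inside",
          legend.position.inside = c(1, 0),
          legend.justification = c(1, 0))
} else {
  # Older interface, as used in the code above.
  p_inside <- base_legend +
    theme(legend.position = c(1, 0),
          legend.justification = c(1, 0))
}
print(p_inside)
```

In both forms, c(1, 0) with a matching justification pins the bottom-right corner of the legend to the bottom-right corner of the plotting area.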

6.12 Annotation

6.13 Saving ggplot

plot1 <- ggplot(mtcars, aes(x=cyl)) + geom_bar()
ggsave("myggplot.png")  # saves the last plot.
ggsave("myggplot.png", plot=plot1)  # save a stored ggplot
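
ggsave() also takes the output size and resolution. The sketch below writes to a temporary directory so that it does not overwrite anything; the size and dpi values are arbitrary choices for illustration.

```r
library(ggplot2)

plot_cyl <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# Write into a temporary directory; the file type is inferred from the extension.
out_file <- file.path(tempdir(), "myggplot.png")
ggsave(out_file, plot = plot_cyl, width = 6, height = 4, units = "in", dpi = 300)

file.exists(out_file)  # check that the file was written
```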